Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition

Abstract

Discriminative training techniques define state-of-the-art performance for automatic speech recognition systems. However, they are inherently prone to overfitting, leading to poor generalization when using limited training data. In order to address this issue, this paper presents a full Bayesian framework to account for model uncertainty in sequence discriminative training of factored TDNN acoustic models. Several Bayesian learning based TDNN variant systems are proposed to model the uncertainty over weight parameters and choices of hidden activation functions, or the hidden layer outputs. Efficient variational inference approaches using as few as one single parameter sample ensure their computational cost in both training and evaluation time is comparable to that of the baseline TDNN systems. Statistically significant word error rate (WER) reductions of 0.4%-1.8% absolute (5%-11% relative) were obtained over a 900 h speed perturbed Switchboard corpus trained baseline LF-MMI factored TDNN system using multiple regularization methods including F-smoothing, L2 norm penalty, natural gradient, model averaging and dropout, in addition to i-Vector plus learning hidden unit contribution (LHUC) based speaker adaptation and RNNLM rescoring. The efficacy of the proposed Bayesian techniques is further demonstrated in a comparison against the performance obtained on the same task using the most recent hybrid and end-to-end systems reported in the literature. Consistent performance improvements were also obtained on a 450-h HKUST conversational Mandarin telephone speech recognition task. On a third cross domain adaptation task requiring rapidly porting a 1000-h LibriSpeech data trained system to a small DementiaBank elderly speech corpus, the proposed Bayesian TDNN LF-MMI systems outperformed the baseline system using direct weight fine-tuning by up to 2.5% WER reduction.
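The idea of variational inference over weight parameters with a single parameter sample can be illustrated with a minimal sketch. This is not the paper's implementation: the Gaussian posterior, the standard normal prior, the softplus parameterization of the standard deviation, the use of the posterior mean at evaluation time, and all function names below are assumptions made for illustration only.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the standard deviation positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_weights(mu, rho, rng):
    # Reparameterization: w = mu + sigma * eps with eps ~ N(0, 1).
    # Exactly one sample is drawn per weight (the "single sample" case).
    return [[m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(m_row, r_row)]
            for m_row, r_row in zip(mu, rho)]


def kl_to_standard_normal(mu, rho):
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights.
    # Assumed prior: standard normal (an illustrative choice).
    total = 0.0
    for m_row, r_row in zip(mu, rho):
        for m, r in zip(m_row, r_row):
            s = softplus(r)
            total += 0.5 * (s * s + m * m - 1.0) - math.log(s)
    return total


def linear(weights, x):
    # Plain matrix-vector product: one output per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def bayesian_forward(mu, rho, x, rng=None):
    # With an rng: draw one weight sample (training-style pass).
    # Without an rng: fall back to the posterior mean (assumed evaluation mode).
    weights = mu if rng is None else sample_weights(mu, rho, rng)
    return linear(weights, x)


if __name__ == "__main__":
    mu = [[1.0, 2.0], [0.0, -1.0]]      # hypothetical posterior means
    rho = [[-3.0, -3.0], [-3.0, -3.0]]  # hypothetical pre-softplus std parameters
    x = [1.0, 1.0]
    print("sampled output:", bayesian_forward(mu, rho, x, random.Random(0)))
    print("mean output:   ", bayesian_forward(mu, rho, x))
    print("KL penalty:    ", kl_to_standard_normal(mu, rho))
```

In a training loop of this kind, the KL term would be added to the data-fit loss so that the posterior stays close to the prior. Because only one sample is drawn per pass, the cost is close to that of an ordinary deterministic layer.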

Related resources

Modular Construction of Time-Delay Neural Networks for Speech Recognition

Several strategies are described that overcome limitations of basic network models as steps towards the design of large connectionist speech recognition systems. The two major areas of concern are the problem of time and the problem of scaling. Speech signals continuously vary over time and encode and transmit enormous amounts of human knowledge. To decode these signals, neural networks must be...

Multi-State Time Delay Neural Networks for Continuous Speech Recognition

Alex Waibel, Carnegie Mellon University, Pittsburgh, PA 15213. We present the "Multi-State Time Delay Neural Network" (MS-TDNN) as an extension of the TDNN to robust word recognition. Unlike most other hybrid methods, the MS-TDNN embeds an alignment search procedure into the connectionist architecture, and allows for word level supervision. The resulting system has the ability to ma...

Phoneme recognition using time-delay neural networks

In this paper we present a Time-Delay Neural Network (TDNN) approach to phoneme recognition which is characterized by two important properties. 1) Using a 3-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces. The TDNN learns these decision surfaces automatically using error backpropagation [1]. 2) Th...

Efficient computation of MMI neural networks for large vocabulary speech recognition systems

This paper describes how to train Maximum Mutual Information Neural Networks (MMINN) efficiently, using a new topology. Large vocabulary speech recognition systems based on a hybrid MMI/connectionist HMM combination have shown good performance on several tasks [1], [2]. MMINNs are trained to maximize the mutual information between the index of the winning output neuron (Winner-Take...

A new hybrid system based on MMI-neural networks for the RM speech recognition task

We present a hybrid speech recognition system for speaker-independent continuous speech recognition. The system combines a novel information-theory-based neural network (NN) paradigm and discrete Hidden Markov models (HMMs), including state-of-the-art techniques like state-clustered triphones. The novel NN type is trained by an algorithm based on principles of self-organization that achieves max...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2021

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2021.3069080